Evaluating Web Archive Search Systems

Authors

  • Miguel Costa
  • Mário J. Silva
Abstract

The information published on the web, a representation of our collective memory, is rapidly vanishing. At least 77 web archives have been developed to cope with the web’s transience problem. Although their technology has reached a good level of maturity, the search services they provide still deliver unsatisfactory retrieval effectiveness. In this work, we propose an evaluation methodology for web archive search systems based on a list of requirements compiled from previous characterizations of web archives and their users. The methodology includes the design of a test collection and the selection of evaluation measures to support realistic and reproducible experiments. The test collection made it possible, for the first time, to measure the effectiveness of state-of-the-art IR technology employed in web archives. Results confirm the poor quality of search results retrieved with such technology. However, we show how to combine temporal features with the regular topical features to improve search effectiveness on web archives. The test collection is available to the research community.
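The abstract does not say which evaluation measures were selected, so the following is only an illustrative sketch of how a ranked result list might be scored against relevance judgments from a test collection. It assumes graded judgments and uses two standard IR measures, precision at k and nDCG at k; the function names, the document identifiers, and the judgment values are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def precision_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """Fraction of the top-k results whose judged grade is above zero."""
    top = ranking[:k]
    return sum(1 for doc in top if qrels.get(doc, 0) > 0) / k


def dcg_at_k(grades: List[int], k: int) -> float:
    """Discounted cumulative gain with exponential gain and log2 discount."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    gains = [qrels.get(doc, 0) for doc in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one topic: archived page version -> relevance grade.
qrels = {"example.org@2005": 2, "example.org@2009": 0, "example.net@2006": 1}

# Two hypothetical system outputs for the same topic.
good_run = ["example.org@2005", "example.net@2006", "example.org@2009"]
poor_run = ["example.org@2009", "example.org@2005", "example.net@2006"]

print(precision_at_k(good_run, qrels, 2), ndcg_at_k(good_run, qrels, 3))
print(precision_at_k(poor_run, qrels, 2), ndcg_at_k(poor_run, qrels, 3))
```

Averaging such per-topic scores over all topics in a collection is the usual way to compare two ranking configurations, for instance one using only topical features and one that also uses temporal features.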

Similar resources

LNCS 8421 - Database Systems for Advanced Applications

Ranked search of datasets has emerged as a need as shared scientific archives grow in size and variety. Our own investigations have shown that IR-style, feature-based relevance scoring can be an effective tool for data discovery in scientific archives. However, maintaining interactive response times as archives scale will be a challenge. We report here on our exploration of performance technique...

What Makes a Search Engine Good for Democracy? PUBLIC OPINION POLLING AND THE EVALUATION OF SOFTWARE

We propose one possible set of criteria for evaluating software, specifically search engines, according to their usefulness for deliberative democracy. We then describe a user study of the search capabilities of three existing online archives (Google Groups, Omgili, and Technorati) of threaded, conversational data. Our study measures the capabilities of these search engines according to the ...

The Conversion Software Registry

We have designed a web-based Conversion Software Registry (CSR) for collecting information about software that is capable of file format conversions. The work is motivated by a community need for finding file format conversions inaccessible via current search engines and by the specific need to support systems that could actually perform conversions, such as the NCSA Polyglot[2]. In addition, th...

The Journey is the Reward - Towards New Paradigms in Web Search

Without search engines, the information content of the World Wide Web would remain largely closed to the ordinary user. Current web search engines work well as long as the user knows what she is looking for. The situation becomes problematic if the user has insufficient expertise or prior knowledge to formulate the search query. Often a sequence of search requests is necessary to answer the us...

Adaptive Search Support for Information Seeking Stages

We use the Web for work, leisure, and research, assisted by various search systems in the task of satisfying our information needs. We utilize these systems to perform our daily tasks, ranging from simple lookup tasks to complex, exploratory and analytical ventures. The more complex tasks may involve multiple information seeking stages, with evolving inherent needs for each stage. Most search s...

Publication date: 2012